# Code submission for Bayesian Generational Population-based Training (BG-PBT)

## Dependencies

We provide a requirements file at ```requirements.txt``` which may be used as follows:

```
conda create --name bgpbt --file requirements.txt
```

For a CPU-only version, please change ```requirements.txt``` to ```requirements_cpu.txt```.

Pay particular attention to the Brax version: the package is still under active development, and different versions often lead to significant discrepancies in the results. Our paper uses version 0.10.0.


## Scripts to run experiments in the paper

### Main experiments
Full BGPBT (with architectures). Humanoid and Hopper use different hyperparameters from the other environments; see the Appendix for details. In each command, keep only one seed from the ```--seed``` list.
```
python3 -m test_scripts.run_pbt -v -e ant --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both  -td 30_000_000
python3 -m test_scripts.run_pbt -v -e halfcheetah --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both  -td 30_000_000
python3 -m test_scripts.run_pbt -v -e humanoid --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both  -td 40_000_000 -de 60_000_000 -md 1
python3 -m test_scripts.run_pbt -v -e hopper --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both  -td 40_000_000 -de 60_000_000 -md 1
python3 -m test_scripts.run_pbt -v -e fetch --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both  -td 30_000_000
python3 -m test_scripts.run_pbt -v -e reacher --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both -td 30_000_000
python3 -m test_scripts.run_pbt -v -e ur5e --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both -td 30_000_000
```
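
Note that pasting a command verbatim into bash expands ```{0,1,2,3,100,200,300}``` into seven separate words after ```--seed```. As a minimal sketch, assuming ```--seed``` takes a single integer per run (check ```./test_scripts/run_pbt.py```), the helper below builds one command per seed for the environments that share the default flags. The name ```build_command``` is illustrative and not part of the repository.

```python
import shlex
from typing import List

SEEDS = [0, 1, 2, 3, 100, 200, 300]  # seeds listed in the commands above


def build_command(env: str, seed: int) -> List[str]:
    """Return the argv for one full BG-PBT run with the default flags.

    Matches the commands above for ant, halfcheetah, fetch, reacher and ur5e;
    Humanoid and Hopper need the extra flags shown in their own commands.
    """
    return [
        "python3", "-m", "test_scripts.run_pbt", "-v",
        "-e", env,
        "--pop_size", "8", "-mp", "4", "-qf", "0.125", "-ni", "24",
        "-exist", "overwrite",
        "-tr", "1_000_000", "-mt", "150_000_000",
        "-o", "bgpbt", "--seed", str(seed),
        "-sm", "both", "-td", "30_000_000",
    ]


commands = [build_command("ant", seed) for seed in SEEDS]
for argv in commands:
    # Printed for inspection only; use subprocess.run(argv, check=True) to launch a run.
    print(shlex.join(argv))
```

Building the argument list explicitly avoids any dependence on shell expansion rules.
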
Here we briefly explain the most notable flags (full descriptions may be found in ```./test_scripts/run_pbt.py```):

```-e```: environment {ant/halfcheetah/humanoid/hopper/fetch/reacher/ur5e}

```--pop_size```: population size. We use 8 for all experiments; the appendix additionally shows results with 24 agents.

```-mp --max_parallel```: maximum number of agents to **actually** run at the same time, up to ```pop_size```.
Adjust this based on the VRAM of your GPU. On a single NVIDIA GeForce RTX 3090 with 24 GB of VRAM,
```-mp=4``` is safe for all experiments except Humanoid (where 2 is used). A smaller ```-mp```
slows wall-clock time but should not affect the results, because the algorithm is synchronous:
it waits for the entire population to finish before starting the next iteration.
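
To make the synchronous scheduling concrete, here is a minimal sketch of how a population could be split into waves of at most ```-mp``` agents. The function ```waves``` is hypothetical and only illustrates the idea; the actual scheduling in the repository may differ.

```python
from typing import List


def waves(pop_size: int, max_parallel: int) -> List[List[int]]:
    """Split agent indices 0..pop_size-1 into consecutive groups of at most max_parallel."""
    if pop_size < 1 or max_parallel < 1:
        raise ValueError("pop_size and max_parallel must be positive")
    step = min(max_parallel, pop_size)  # -mp is capped at pop_size
    return [list(range(i, min(i + step, pop_size))) for i in range(0, pop_size, step)]


# Every agent trains once per iteration regardless of -mp; only the number of waves changes.
print(waves(8, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(waves(8, 2))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```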

```-qf --quantile_fraction```: the fraction of agents to be replaced at each iteration (e.g. 0.125 of a population of 8 is one agent).
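
As a minimal sketch of bottom-quantile selection, the snippet below picks the agents that would be replaced for a given fraction. The rounding rule (nearest integer, at least one agent) and the name ```agents_to_replace``` are assumptions for illustration, not taken from the repository, and the reward values are made up.

```python
from typing import List, Sequence


def agents_to_replace(scores: Sequence[float], quantile_fraction: float) -> List[int]:
    """Return indices of the lowest-scoring agents, i.e. the bottom quantile."""
    n_replace = max(1, round(quantile_fraction * len(scores)))  # assumed rounding rule
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])  # worst first
    return ranked[:n_replace]


rewards = [310.0, 120.5, 980.2, 455.0, 77.3, 640.1, 502.9, 215.4]  # made-up returns
print(agents_to_replace(rewards, 0.125))  # [4]
```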

```-ni```: number of initialising agents

```-o --optimizer```: bgpbt/pbt/pb2


PBT/PB2 baselines (in each command below, keep one optimizer and one seed).
The PBT/PB2 implementations are largely lifted (with minor adaptations) from the repository provided by the original authors:
https://github.com/jparkerholder/procgen_autorl

```
python3 -m test_scripts.run_pbt -v -e ant --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e halfcheetah --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e humanoid --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e hopper --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e fetch --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e reacher --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e ur5e --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo
```

The configurations found by RS and BO (using SMAC3) can be found in ```./smac_baselines.py```.
Each configuration is identified by a name of the form ```{env_name}_{nas/no_nas}_{same_resource/full}```, where:
- ```nas/no_nas```: whether the search is over the joint hyperparameter/architecture space or over the hyperparameter space only
- ```same_resource/full```: ```same_resource``` is the best config found after 8 random search steps; ```full``` is the one found by running sequential SMAC for the full 50 steps.
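
As a sketch of this naming scheme, the snippet below enumerates the names the pattern implies for the seven environments listed earlier. How the configurations are stored in ```./smac_baselines.py``` (for example as module-level variables or dictionary keys), and whether every combination is present, is not stated here, so treat any lookup by these names as an assumption.

```python
from itertools import product

ENVS = ["ant", "halfcheetah", "humanoid", "hopper", "fetch", "reacher", "ur5e"]
SEARCH_SPACES = ["nas", "no_nas"]
BUDGETS = ["same_resource", "full"]

# One name per (environment, search space, budget) combination.
config_names = [
    f"{env}_{space}_{budget}"
    for env, space, budget in product(ENVS, SEARCH_SPACES, BUDGETS)
]
print(len(config_names))  # 28
print(config_names[0])    # ant_nas_same_resource
```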

### Ablation studies

BGPBT without distillation and architecture search
```
python3 -m test_scripts.run_pbt -v -e ant --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e halfcheetah --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e humanoid --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e hopper --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e fetch --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e reacher --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e ur5e --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo
```

**NOTE: The ```./data``` and ```./ckpts/``` folders are empty in the OpenReview supplementary material due to the file size limit. You may obtain them from the anonymous GitHub repo [here](https://anonymous.4open.science/r/bgpbt-submission-E6CE)**

### Intermediate results and plotting scripts
See ```./data/plot.ipynb``` for how to generate the figures in the paper. The ```./data``` folder contains the raw CSV files generated by running the scripts.

### Checkpoints to visualize the policies
We include some checkpoints in ```./ckpts``` and a notebook to quickly render policies found by BGPBT.
Animations and further content may be found at https://sites.google.com/view/bgpbt
